
towards an even better backward attention kernel #179

Merged: 4 commits into karpathy:master on Apr 19, 2024

Conversation

@ngc92 (Contributor) commented on Apr 19, 2024

Back to the drawing board, because I think the other kernels hit a local minimum, or at least the way the loops were organized made it very difficult to think about how to optimize this further.

There is quite a bit of room to optimize this version further, but for educational purposes I think it is better to have a simpler version first, where the main idea is evident, and then go crazy with the optimizations in a second version.

This design limits T to multiples of the block size. Incidentally, on my system at least, block size 64 turns out to be fastest, which is compatible with both our training and test scripts.
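A minimal sketch of the launch pattern this description implies, assuming a kernel templated on the block size so the compiler sees it as a compile-time constant, with the launcher checking that T is a multiple of the block size and dispatching on the runtime value. The kernel name, launcher name, and grid shape here are illustrative, not the actual code added in this PR:

```cuda
#include <cassert>

// Hypothetical sketch (not the kernel added in this PR): template the kernel on
// BlockSize so it is statically known to the compiler, and require T to be a
// multiple of the block size at launch time.
template<int BlockSize>
__global__ void softmax_autoregressive_backward_sketch(float* dpreatt, const float* datt,
                                                        const float* att, int T) {
    // BlockSize is a compile-time constant here, so shared-memory buffers can be
    // sized statically and loops over the block can be fully unrolled.
    __shared__ float buf[BlockSize];
    buf[threadIdx.x] = 0.0f;   // placeholder; the real backward computation is elided
    (void)dpreatt; (void)datt; (void)att; (void)T;
}

// Launcher: dispatch the runtime block_size to the matching template instantiation.
void launch_softmax_backward_sketch(float* dpreatt, const float* datt, const float* att,
                                    int B, int NH, int T, int block_size) {
    assert(T % block_size == 0);        // the design limits T to multiples of block_size
    dim3 grid(T / block_size, B * NH);  // illustrative grid shape
    switch (block_size) {
        case 64:  softmax_autoregressive_backward_sketch<64><<<grid, 64>>>(dpreatt, datt, att, T); break;
        case 128: softmax_autoregressive_backward_sketch<128><<<grid, 128>>>(dpreatt, datt, att, T); break;
        case 256: softmax_autoregressive_backward_sketch<256><<<grid, 256>>>(dpreatt, datt, att, T); break;
        default:  assert(false && "unsupported block size");
    }
}
```

The switch on block_size is what pays for the template: each case instantiates a kernel where BlockSize is a literal constant, at the cost of compiling one kernel per supported block size.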

@@ -560,6 +561,128 @@ __global__ void softmax_autoregressive_backward_kernel5(float* __restrict__ dpre
}
}


// I want `BlockSize` to be statically known to the compiler, thus we get a template here.
@karpathy (Owner) commented:

love this comment block

@karpathy merged commit b556ad9 into karpathy:master on Apr 19, 2024
@ngc92 deleted the even-better-attention branch on April 28, 2024